NSF PAR Search | NSF Public Access Repository

Structure-based out-of-distribution (OOD) materials property prediction: a benchmark study

https://doi.org/10.1038/s41524-024-01316-4

Omee, Sadman Sadeed; Fu, Nihang; Dong, Rongzhi; Hu, Ming; Hu, Jianjun (July 2024, npj Computational Materials)

Abstract In real-world materials research, machine learning (ML) models are usually expected to predict and discover novel exceptional materials that deviate from the known materials. It is thus a pressing question to provide an objective evaluation of ML model performances in property prediction of out-of-distribution (OOD) materials that are different from the training set. Traditional performance evaluation of materials property prediction models through the random splitting of the dataset frequently results in artificially high-performance assessments due to the inherent redundancy of typical material datasets. Here we present a comprehensive benchmark study of structure-based graph neural networks (GNNs) for extrapolative OOD materials property prediction. We formulate five different categories of OOD ML problems for three benchmark datasets from the MatBench study. Our extensive experiments show that current state-of-the-art GNN algorithms significantly underperform for the OOD property prediction tasks on average compared to their baselines in the MatBench study, demonstrating a crucial generalization gap in realistic material prediction tasks. We further examine the latent physical spaces of these GNN models and identify the sources of CGCNN, ALIGNN, and DeeperGATGNN’s significantly more robust OOD performance than those of the current best models in the MatBench study (coGN and coNGN) as a case study for the perovskites dataset, and provide insights to improve their performance.
Materials synthesizability and stability prediction using a semi-supervised teacher-student dual neural network

https://doi.org/10.1039/d2dd00098a

Gleaves, Daniel; Fu, Nihang; Dilanga Siriwardane, Edirisuriya M.; Zhao, Yong; Hu, Jianjun (April 2023, Digital Discovery)

Data-driven generative deep learning models have recently emerged as one of the most promising approaches for new materials discovery. While generator models can generate millions of candidates, it is critical to train fast and accurate machine learning models to filter out stable, synthesizable materials with the desired properties. However, efforts to build supervised regression or classification screening models have been severely hindered by the lack of unstable or unsynthesizable samples, which usually are not collected and deposited in materials databases such as ICSD and Materials Project (MP). At the same time, there is a significant amount of unlabeled data available in these databases. Here we propose a semi-supervised deep neural network (TSDNN) model for high-performance formation energy and synthesizability prediction, which is achieved via its unique teacher-student dual network architecture and its effective exploitation of the large amount of unlabeled data. For formation-energy-based stability screening, our semi-supervised classifier achieves an absolute 10.3% accuracy improvement compared to the baseline CGCNN regression model. For synthesizability prediction, our model significantly increases the baseline PU learning's true positive rate from 87.9% to 92.9% using only 1/49 of the model parameters. To further prove the effectiveness of our models, we combined our TSDNN-energy and TSDNN-synthesizability models with our CubicGAN generator to discover novel stable cubic structures. Of the 1000 candidate samples recommended by our models, 512 have negative formation energies as validated by our DFT formation energy calculations. Our experimental results show that our semi-supervised deep neural networks can significantly improve the screening accuracy in large-scale generative materials design. Our source code can be accessed at https://github.com/usccolumbia/tsdnn.
Full Text Available
Material transformers: deep learning language models for generative materials design

https://doi.org/10.1088/2632-2153/acadcd

Fu, Nihang; Wei, Lai; Song, Yuqi; Li, Qinyang; Xin, Rui; Omee, Sadman Sadeed; Dong, Rongzhi; Siriwardane, Edirisuriya M; Hu, Jianjun (January 2023, Machine Learning: Science and Technology)

Abstract Transformer language models (LMs) pre-trained on large unlabeled corpora have produced state-of-the-art results in natural language processing, organic molecule design, and protein sequence generation. However, no such models have been applied to learn the composition patterns for the generative design of material compositions. Here we train a series of seven modern transformer models (GPT, GPT-2, GPT-Neo, GPT-J, BLMM, BART, and RoBERTa) for materials design using the expanded formulas of the ICSD, OQMD, and Materials Project databases. Six different datasets with or without non-charge-neutral or EB samples are used to benchmark the generative design performances and uncover the biases of modern transformer models for the generative design of materials compositions. Our experiments show that the materials transformers based on causal LMs can generate chemically valid material compositions, with up to 97.61% being charge neutral and 91.22% being electronegativity balanced, an enrichment more than six times higher than that of the baseline pseudo-random sampling algorithm. Our LMs also demonstrate high generation novelty, and their potential in new materials discovery is demonstrated by their capability to recover the held-out materials. We also find that the properties of the generated compositions can be tailored by training the models with selected training sets such as high-bandgap samples. Our experiments also show that different models each have their own preference in terms of the properties of the generated samples and that their running time complexity varies considerably. We have applied our materials transformers to discover a set of new materials as validated using density functional theory calculations. All our trained materials transformer models and code can be accessed freely at http://www.github.com/usccolumbia/MTransformer.
Full Text Available
Physics guided deep learning for generative design of crystal materials with symmetry constraints

https://doi.org/10.1038/s41524-023-00987-9

Zhao, Yong; Siriwardane, Edirisuriya M. Dilanga; Wu, Zhenyao; Fu, Nihang; Al-Fahdi, Mohammed; Hu, Ming; Hu, Jianjun (March 2023, npj Computational Materials)

Abstract Discovering new materials is a challenging task in materials science crucial to the progress of human society. Conventional approaches based on experiments and simulations are labor-intensive or costly, with success heavily depending on experts’ heuristic knowledge. Here, we propose a deep-learning-based Physics Guided Crystal Generative Model (PGCGM) for efficient crystal material design with high structural diversity and symmetry. Our model increases the generation validity by more than 700% compared to FTCP, one of the latest structure generators, and by more than 45% compared to our previous CubicGAN model. Density Functional Theory (DFT) calculations are used to validate the generated structures: 1869 of 2000 materials are successfully optimized and deposited into the Carolina Materials Database (www.carolinamatdb.org), of which 39.6% have negative formation energy and 5.3% have energy-above-hull less than 0.25 eV/atom, indicating their thermodynamic stability and potential synthesizability.
DeepXRD, a Deep Learning Model for Predicting XRD spectrum from Material Composition

https://doi.org/10.1021/acsami.2c05812

Dong, Rongzhi; Zhao, Yong; Song, Yuqi; Fu, Nihang; Omee, Sadman Sadeed; Dey, Sourin; Li, Qinyang; Wei, Lai; Hu, Jianjun (September 2022, ACS Applied Materials & Interfaces)

Full Text Available
TCSP: a Template-Based Crystal Structure Prediction Algorithm for Materials Discovery

https://doi.org/10.1021/acs.inorgchem.1c03879

Wei, Lai; Fu, Nihang; Siriwardane, Edirisuriya M.; Yang, Wenhui; Omee, Sadman Sadeed; Dong, Rongzhi; Xin, Rui; Hu, Jianjun (June 2022, Inorganic Chemistry)

Full Text Available
Scalable deeper graph neural networks for high-performance materials property prediction

https://doi.org/10.1016/j.patter.2022.100491

Omee, Sadman Sadeed; Louis, Steph-Yves; Fu, Nihang; Wei, Lai; Dey, Sourin; Dong, Rongzhi; Li, Qinyang; Hu, Jianjun (May 2022, Patterns)

Full Text Available
Crystal Composition Transformer: Self‐Learning Neural Language Model for Generative and Tinkering Design of Materials

https://doi.org/10.1002/advs.202304305

Wei, Lai; Li, Qinyang; Song, Yuqi; Stefanov, Stanislav; Dong, Rongzhi; Fu, Nihang; Siriwardane, Edirisuriya M. D.; Chen, Fanglin; Hu, Jianjun (August 2024, Advanced Science)

Abstract Self‐supervised neural language models have recently achieved unprecedented success, from natural language processing to learning the languages of biological sequences and organic molecules. These models have demonstrated superior performance in the generation, structure classification, and functional predictions for proteins and molecules with learned representations. However, most of the masking‐based pre‐trained language models are not designed for generative design, and their black‐box nature makes it difficult to interpret their design logic. Here a Blank‐filling Language Model for Materials (BLMM) Crystal Transformer is proposed, a neural network‐based probabilistic generative model for generative and tinkering design of inorganic materials. The model is built on the blank‐filling language model for text generation and has demonstrated unique advantages in learning the “materials grammars” together with high‐quality generation, interpretability, and data efficiency. It can generate chemically valid materials compositions with as high as 89.7% charge neutrality and 84.8% balanced electronegativity, which are more than four and eight times higher, respectively, than a pseudo‐random sampling baseline. The probabilistic generation process of BLMM allows it to recommend materials tinkering operations based on learned materials chemistry, which makes it useful for materials doping. The model is applied to discover a set of new materials as validated using Density Functional Theory (DFT) calculations. This work thus brings generative artificial intelligence based on unsupervised transformer language models to inorganic materials. A user‐friendly web app for tinkering materials design has been developed and can be accessed freely at www.materialsatlas.org/blmtinker.
